Overview

Dataset statistics

Number of variables: 42
Number of observations: 199523
Missing cells: 415717
Missing cells (%): 5.0%
Duplicate rows: 3229
Duplicate rows (%): 1.6%
Total size in memory: 390.5 MiB
Average record size in memory: 2.0 KiB
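
The percentages and the average record size follow from the raw counts. The sketch below redoes that arithmetic using only the figures reported in this block. The variable names are illustrative, and the definitions (missing cells as a share of rows times columns, record size as total memory divided by rows) are assumptions about how the report derives them, not something it states.

```python
# Figures copied from the "Dataset statistics" block.
n_vars = 42
n_obs = 199523
missing_cells = 415717
duplicate_rows = 3229
total_mib = 390.5

# Assumed definitions: share of all cells, share of all rows, memory per row.
missing_pct = 100 * missing_cells / (n_vars * n_obs)
duplicate_pct = 100 * duplicate_rows / n_obs
avg_record_kib = total_mib * 1024 / n_obs

print(f"{missing_pct:.1f}% missing cells, {duplicate_pct:.1f}% duplicate rows, "
      f"{avg_record_kib:.1f} KiB per record")
```

Under these assumptions the rounded results agree with the reported 5.0%, 1.6% and 2.0 KiB.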

Variable types

CAT: 34
NUM: 8

Reproduction

Analysis started: 2020-02-23 16:28:22.592393
Analysis finished: 2020-02-23 16:33:30.831925
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 3229 (1.6%) duplicate rows [Duplicates]
detailed_industry_recode has a high cardinality: 52 distinct values [High cardinality]
major_industry_code is highly correlated with detailed_industry_recode [High Correlation]
detailed_industry_recode is highly correlated with major_industry_code [High Correlation]
major_occupation_code is highly correlated with detailed_occupation_recode [High Correlation]
detailed_occupation_recode is highly correlated with major_occupation_code [High Correlation]
region_of_previous_residence is highly correlated with tax_filer_stat and 1 other field [High Correlation]
tax_filer_stat is highly correlated with region_of_previous_residence [High Correlation]
detailed_household_and_family_stat is highly correlated with state_of_previous_residence [High Correlation]
state_of_previous_residence is highly correlated with detailed_household_and_family_stat [High Correlation]
live_in_this_house_1_year_ago is highly correlated with migration_code-change_in_msa and 3 other fields [High Correlation]
migration_code-change_in_msa is highly correlated with live_in_this_house_1_year_ago [High Correlation]
migration_code-change_in_reg is highly correlated with live_in_this_house_1_year_ago [High Correlation]
migration_code-move_within_reg is highly correlated with live_in_this_house_1_year_ago [High Correlation]
migration_prev_res_in_sunbelt is highly correlated with region_of_previous_residence [High Correlation]
year is highly correlated with live_in_this_house_1_year_ago [High Correlation]
migration_code-change_in_msa has 99696 (50.0%) missing values [Missing]
migration_code-change_in_reg has 99696 (50.0%) missing values [Missing]
migration_code-move_within_reg has 99696 (50.0%) missing values [Missing]
migration_prev_res_in_sunbelt has 99696 (50.0%) missing values [Missing]
country_of_birth_father has 6713 (3.4%) missing values [Missing]
country_of_birth_mother has 6119 (3.1%) missing values [Missing]
country_of_birth_self has 3393 (1.7%) missing values [Missing]
dividends_from_stocks is highly skewed (γ1 = 27.78650179) [Skewed]
age has 2839 (1.4%) zeros [Zeros]
wage_per_hour has 188219 (94.3%) zeros [Zeros]
capital_gains has 192144 (96.3%) zeros [Zeros]
capital_losses has 195617 (98.0%) zeros [Zeros]
dividends_from_stocks has 178382 (89.4%) zeros [Zeros]
num_persons_worked_for_employer has 95983 (48.1%) zeros [Zeros]
weeks_worked_in_year has 95983 (48.1%) zeros [Zeros]
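
The cardinality and skewness flags look like simple threshold checks on per-variable statistics. The sketch below is one way such rules might be written. The two threshold values are assumptions chosen only because they agree with the flags in this report (52 distinct values is flagged and 47 is not; skewness 27.79 is flagged and 18.99 is not). They are not taken from the report's configuration, and `flags` is a hypothetical helper.

```python
# Assumed thresholds, chosen to be consistent with the flags in this report.
CARDINALITY_THRESHOLD = 50
SKEWNESS_THRESHOLD = 20.0


def flags(distinct=None, skewness=None):
    """Return the warning labels a variable would get under the assumed rules."""
    labels = []
    if distinct is not None and distinct > CARDINALITY_THRESHOLD:
        labels.append("High cardinality")
    if skewness is not None and abs(skewness) > SKEWNESS_THRESHOLD:
        labels.append("Skewed")
    return labels


# Statistics copied from the per-variable sections of this report.
stats = {
    "detailed_industry_recode": {"distinct": 52},
    "detailed_occupation_recode": {"distinct": 47},
    "dividends_from_stocks": {"skewness": 27.78650179},
    "capital_gains": {"skewness": 18.99082234},
}
flagged = {name: flags(**values) for name, values in stats.items()}
print(flagged)
```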

Variables

age
Real number (ℝ≥0)

ZEROS
Distinct count: 91
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.494198663813194
Minimum: 0
Maximum: 90
Zeros: 2839
Zeros (%): 1.4%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 15
Median: 33
Q3: 50
95-th percentile: 75
Maximum: 90
Range: 90
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 22.31089521
Coefficient of variation (CV): 0.6468013774
Kurtosis: -0.7328243009
Mean: 34.49419866
Median Absolute Deviation (MAD): 18.53522723
Skewness: 0.3732904573
Sum: 6882386
Variance: 497.7760449

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 8.5 14.5 17.5 ... 84.5 85.5 87.5 89.5 90. ], "bayesian blocks" binning strategy used)
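
Several of the descriptive statistics for age are linked by simple identities. As a consistency check, the sketch below recomputes the coefficient of variation, variance, IQR, range and sum from the other reported figures. The variable names are illustrative, and the inputs are the rounded values printed in the report.

```python
# Figures copied from the age statistics.
n_obs = 199523
mean = 34.49419866
std = 22.31089521
q1, q3 = 15, 50
minimum, maximum = 0, 90

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
total = mean * n_obs              # sum of all ages

print(cv, variance, iqr, value_range, round(total))
```

Each result agrees with the corresponding reported value up to rounding of the inputs.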

Value | Count | Frequency (%)
34 | 3489 | 1.7%
35 | 3450 | 1.7%
36 | 3353 | 1.7%
31 | 3351 | 1.7%
33 | 3340 | 1.7%
5 | 3332 | 1.7%
4 | 3318 | 1.7%
3 | 3279 | 1.6%
37 | 3278 | 1.6%
38 | 3277 | 1.6%
Other values (81) | 166056 | 83.2%

Value | Count | Frequency (%)
0 | 2839 | 1.4%
1 | 3138 | 1.6%
2 | 3236 | 1.6%
3 | 3279 | 1.6%
4 | 3318 | 1.7%

Value | Count | Frequency (%)
90 | 725 | 0.4%
89 | 195 | 0.1%
88 | 241 | 0.1%
87 | 301 | 0.2%
86 | 348 | 0.2%

class_of_worker
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 100245 | 50.2%
Private | 72028 | 36.1%
Self-employed-not incorporated | 8445 | 4.2%
Local government | 7784 | 3.9%
State government | 4227 | 2.1%
Self-employed-incorporated | 3265 | 1.6%
Federal government | 2925 | 1.5%
Never worked | 439 | 0.2%
Without pay | 165 | 0.1%
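
The frequency column is each count divided by the number of observations. The sketch below recomputes it for class_of_worker from the counts listed above and confirms that they add up to the 199523 observations in the dataset. The dictionary name is illustrative.

```python
# Counts copied from the class_of_worker value table.
counts = {
    "Not in universe": 100245,
    "Private": 72028,
    "Self-employed-not incorporated": 8445,
    "Local government": 7784,
    "State government": 4227,
    "Self-employed-incorporated": 3265,
    "Federal government": 2925,
    "Never worked": 439,
    "Without pay": 165,
}

total = sum(counts.values())
# Frequency in percent, rounded to one decimal as in the report.
frequency_pct = {value: round(100 * count / total, 1) for value, count in counts.items()}

for value, pct in frequency_pct.items():
    print(f"{value}: {pct}%")
```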
 

Length

Max length: 31
Mean length: 14.02115546
Min length: 8
ValueCountFrequency (%) 
Lowercase_Letter 21 72.4%
 
Uppercase_Letter 6 20.7%
 
Space_Separator 1 3.4%
 
Dash_Punctuation 1 3.4%
 
ValueCountFrequency (%) 
Latin 27 93.1%
 
Common 2 6.9%
 
ValueCountFrequency (%) 
ASCII 29 100.0%
 

detailed_industry_recode
Categorical

HIGH CARDINALITY
HIGH CORRELATION
Distinct count: 52
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 197.9 KiB

Value | Count | Frequency (%)
0 | 100684 | 50.5%
33 | 17070 | 8.6%
43 | 8283 | 4.2%
4 | 5984 | 3.0%
42 | 4683 | 2.3%
45 | 4482 | 2.2%
29 | 4209 | 2.1%
37 | 4022 | 2.0%
41 | 3964 | 2.0%
32 | 3596 | 1.8%
Other values (42) | 42546 | 21.3%

Length

Max length: 2
Mean length: 1.432015357
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 10 100.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

detailed_occupation_recode
Categorical

HIGH CORRELATION
Distinct count: 47
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 196.6 KiB

Value | Count | Frequency (%)
0 | 100684 | 50.5%
2 | 8756 | 4.4%
26 | 7887 | 4.0%
19 | 5413 | 2.7%
29 | 5105 | 2.6%
36 | 4145 | 2.1%
34 | 4025 | 2.0%
10 | 3683 | 1.8%
16 | 3445 | 1.7%
23 | 3392 | 1.7%
Other values (37) | 52988 | 26.6%

Length

Max length: 2
Mean length: 1.401277046
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 10 100.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

education
Categorical

Distinct count: 17
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
High school graduate | 48407 | 24.3%
Children | 47422 | 23.8%
Some college but no degree | 27820 | 13.9%
Bachelors degree(BA AB BS) | 19865 | 10.0%
7th and 8th grade | 8007 | 4.0%
10th grade | 7557 | 3.8%
11th grade | 6876 | 3.4%
Masters degree(MA MS MEng MEd MSW MBA) | 6541 | 3.3%
9th grade | 6230 | 3.1%
Associates degree-occup /vocational | 5358 | 2.7%
Other values (7) | 15440 | 7.7%

Length

Max length: 39
Mean length: 19.86398561
Min length: 9
ValueCountFrequency (%) 
Lowercase_Letter 19 40.4%
 
Uppercase_Letter 13 27.7%
 
Decimal_Number 10 21.3%
 
Open_Punctuation 1 2.1%
 
Other_Punctuation 1 2.1%
 
Close_Punctuation 1 2.1%
 
Space_Separator 1 2.1%
 
Dash_Punctuation 1 2.1%
 
ValueCountFrequency (%) 
Latin 32 68.1%
 
Common 15 31.9%
 
ValueCountFrequency (%) 
ASCII 47 100.0%
 

wage_per_hour
Real number (ℝ≥0)

ZEROS
Distinct count: 1240
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.426908175999756
Minimum: 0
Maximum: 9999
Zeros: 188219
Zeros (%): 94.3%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 495
Maximum: 9999
Range: 9999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 274.8964539
Coefficient of variation (CV): 4.959620931
Kurtosis: 155.2188969
Mean: 55.42690818
Median Absolute Deviation (MAD): 104.5737349
Skewness: 8.935096531
Sum: 11058943
Variance: 75568.06037

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 10. 195. 202.5 212.5 ... 2506. 2812.5 3325. 5512.5 9999. ], "bayesian blocks" binning strategy used)
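
With a median of 0 and 94.3% zeros, a median-based absolute deviation for wage_per_hour would be 0, yet the report lists 104.5737349 under "Median Absolute Deviation (MAD)". The sketch below checks whether that figure instead matches a mean absolute deviation about the mean. This is an inference from the reported numbers, not something the report states. It relies on the value table for this variable, where the only non-zero value below the mean is a single observation of 20.

```python
# Figures copied from the wage_per_hour statistics and value table.
n_obs = 199523
mean = 55.426908175999756
# (value, count) pairs for every observation that lies below the mean.
below_mean = [(0, 188219), (20, 1)]

# Deviations above and below the mean cancel out, so the mean absolute
# deviation equals twice the total shortfall below the mean, divided by n.
shortfall = sum(count * (mean - value) for value, count in below_mean)
mean_abs_dev = 2 * shortfall / n_obs

print(mean_abs_dev)
```

The result agrees with the reported figure to about six decimal places, which suggests the statistic is a mean absolute deviation rather than a median-based one.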

Value | Count | Frequency (%)
0 | 188219 | 94.3%
500 | 734 | 0.4%
600 | 546 | 0.3%
700 | 534 | 0.3%
800 | 507 | 0.3%
1000 | 386 | 0.2%
425 | 376 | 0.2%
900 | 336 | 0.2%
550 | 280 | 0.1%
1200 | 256 | 0.1%
Other values (1230) | 7349 | 3.7%

Value | Count | Frequency (%)
0 | 188219 | 94.3%
20 | 1 | < 0.1%
70 | 1 | < 0.1%
75 | 2 | < 0.1%
100 | 11 | < 0.1%

Value | Count | Frequency (%)
9999 | 1 | < 0.1%
9916 | 1 | < 0.1%
9800 | 2 | < 0.1%
9400 | 2 | < 0.1%
9000 | 1 | < 0.1%

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 186943 | 93.7%
High school | 6892 | 3.5%
College or university | 5688 | 2.9%

Length

Max length: 22
Mean length: 16.03287842
Min length: 12
ValueCountFrequency (%) 
Lowercase_Letter 14 77.8%
 
Uppercase_Letter 3 16.7%
 
Space_Separator 1 5.6%
 
ValueCountFrequency (%) 
Latin 17 94.4%
 
Common 1 5.6%
 
ValueCountFrequency (%) 
ASCII 18 100.0%
 

marital_stat
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Never married | 86485 | 43.3%
Married-civilian spouse present | 84222 | 42.2%
Divorced | 12710 | 6.4%
Widowed | 10463 | 5.2%
Separated | 3460 | 1.7%
Married-spouse absent | 1518 | 0.8%
Married-A F spouse present | 665 | 0.3%

Length

Max length: 32
Mean length: 20.99977947
Min length: 8
ValueCountFrequency (%) 
Lowercase_Letter 17 65.4%
 
Uppercase_Letter 7 26.9%
 
Space_Separator 1 3.8%
 
Dash_Punctuation 1 3.8%
 
ValueCountFrequency (%) 
Latin 24 92.3%
 
Common 2 7.7%
 
ValueCountFrequency (%) 
ASCII 26 100.0%
 

major_industry_code
Categorical

HIGH CORRELATION
Distinct count: 24
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe or children | 100684 | 50.5%
Retail trade | 17070 | 8.6%
Manufacturing-durable goods | 9015 | 4.5%
Education | 8283 | 4.2%
Manufacturing-nondurable goods | 6897 | 3.5%
Finance insurance and real estate | 6145 | 3.1%
Construction | 5984 | 3.0%
Business and repair services | 5651 | 2.8%
Medical except hospital | 4683 | 2.3%
Public administration | 4610 | 2.3%
Other values (14) | 30501 | 15.3%

Length

Max length: 36
Mean length: 24.39614982
Min length: 7
ValueCountFrequency (%) 
Lowercase_Letter 21 55.3%
 
Uppercase_Letter 15 39.5%
 
Space_Separator 1 2.6%
 
Dash_Punctuation 1 2.6%
 
ValueCountFrequency (%) 
Latin 36 94.7%
 
Common 2 5.3%
 
ValueCountFrequency (%) 
ASCII 38 100.0%
 

major_occupation_code
Categorical

HIGH CORRELATION
Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 100684 | 50.5%
Adm support including clerical | 14837 | 7.4%
Professional specialty | 13940 | 7.0%
Executive admin and managerial | 12495 | 6.3%
Other service | 12099 | 6.1%
Sales | 11783 | 5.9%
Precision production craft & repair | 10518 | 5.3%
Machine operators assmblrs & inspctrs | 6379 | 3.2%
Handlers equip cleaners etc | 4127 | 2.1%
Transportation and material moving | 4020 | 2.0%
Other values (5) | 8641 | 4.3%

Length

Max length: 38
Mean length: 20.76417756
Min length: 6
ValueCountFrequency (%) 
Lowercase_Letter 22 64.7%
 
Uppercase_Letter 10 29.4%
 
Other_Punctuation 1 2.9%
 
Space_Separator 1 2.9%
 
ValueCountFrequency (%) 
Latin 32 94.1%
 
Common 2 5.9%
 
ValueCountFrequency (%) 
ASCII 34 100.0%
 

race
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
White | 167365 | 83.9%
Black | 20415 | 10.2%
Asian or Pacific Islander | 5835 | 2.9%
Other | 3657 | 1.8%
Amer Indian Aleut or Eskimo | 2251 | 1.1%

Length

Max length: 28
Mean length: 6.833096936
Min length: 6
ValueCountFrequency (%) 
Lowercase_Letter 16 66.7%
 
Uppercase_Letter 7 29.2%
 
Space_Separator 1 4.2%
 
ValueCountFrequency (%) 
Latin 23 95.8%
 
Common 1 4.2%
 
ValueCountFrequency (%) 
ASCII 24 100.0%
 

hispanic_origin
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
All other | 171907 | 86.2%
Mexican-American | 8079 | 4.0%
Mexican (Mexicano) | 7234 | 3.6%
Central or South American | 3895 | 2.0%
Puerto Rican | 3313 | 1.7%
Other Spanish | 2485 | 1.2%
Cuban | 1126 | 0.6%
NA | 874 | 0.4%
Do not know | 306 | 0.2%
Chicano | 304 | 0.2%

Length

Max length: 26
Mean length: 10.9685099
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 18 58.1%
 
Uppercase_Letter 9 29.0%
 
Open_Punctuation 1 3.2%
 
Close_Punctuation 1 3.2%
 
Space_Separator 1 3.2%
 
Dash_Punctuation 1 3.2%
 
ValueCountFrequency (%) 
Latin 27 87.1%
 
Common 4 12.9%
 
ValueCountFrequency (%) 
ASCII 31 100.0%
 

sex
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Female | 103984 | 52.1%
Male | 95539 | 47.9%

Length

Max length: 7
Mean length: 6.042325947
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 4 57.1%
 
Uppercase_Letter 2 28.6%
 
Space_Separator 1 14.3%
 
ValueCountFrequency (%) 
Latin 6 85.7%
 
Common 1 14.3%
 
ValueCountFrequency (%) 
ASCII 7 100.0%
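
The length statistics for sex are one character longer than the visible labels ("Male" has 4 characters, but the minimum length is 5), and the character table lists a Space_Separator. One explanation, offered here as an assumption rather than something the report states, is that each stored string carries a single leading space. The sketch below checks that reading against the reported mean length.

```python
# Counts copied from the sex value table.
counts = {"Female": 103984, "Male": 95539}

# Assumption: each stored string has one leading space, e.g. " Male".
lengths = {value: len(" " + value) for value in counts}

total = sum(counts.values())
# Count-weighted mean string length over all observations.
mean_length = sum(lengths[value] * count for value, count in counts.items()) / total

print(min(lengths.values()), max(lengths.values()), mean_length)
```

Under this assumption the minimum, maximum and mean lengths agree with the reported 5, 7 and 6.042325947.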
 

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 180459 | 90.4%
No | 16034 | 8.0%
Yes | 3030 | 1.5%

Length

Max length: 16
Mean length: 14.77306376
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 9 75.0%
 
Uppercase_Letter 2 16.7%
 
Space_Separator 1 8.3%
 
ValueCountFrequency (%) 
Latin 11 91.7%
 
Common 1 8.3%
 
ValueCountFrequency (%) 
ASCII 12 100.0%
 

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 193453 | 97.0%
Other job loser | 2038 | 1.0%
Re-entrant | 2019 | 1.0%
Job loser - on layoff | 976 | 0.5%
Job leaver | 598 | 0.3%
New entrant | 439 | 0.2%

Length

Max length: 22
Mean length: 15.9549676
Min length: 11
ValueCountFrequency (%) 
Lowercase_Letter 17 73.9%
 
Uppercase_Letter 4 17.4%
 
Space_Separator 1 4.3%
 
Dash_Punctuation 1 4.3%
 
ValueCountFrequency (%) 
Latin 21 91.3%
 
Common 2 8.7%
 
ValueCountFrequency (%) 
ASCII 23 100.0%
 

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Children or Armed Forces | 123769 | 62.0%
Full-time schedules | 40736 | 20.4%
Not in labor force | 26808 | 13.4%
PT for non-econ reasons usually FT | 3322 | 1.7%
Unemployed full-time | 2311 | 1.2%
PT for econ reasons usually PT | 1209 | 0.6%
Unemployed part- time | 843 | 0.4%
PT for econ reasons usually FT | 525 | 0.3%

Length

Max length: 35
Mean length: 23.33263834
Min length: 19
ValueCountFrequency (%) 
Lowercase_Letter 18 66.7%
 
Uppercase_Letter 7 25.9%
 
Space_Separator 1 3.7%
 
Dash_Punctuation 1 3.7%
 
ValueCountFrequency (%) 
Latin 25 92.6%
 
Common 2 7.4%
 
ValueCountFrequency (%) 
ASCII 27 100.0%
 

capital_gains
Real number (ℝ≥0)

ZEROS
Distinct count: 132
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 434.7189897906507
Minimum: 0
Maximum: 99999
Zeros: 192144
Zeros (%): 96.3%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4697.53128
Coefficient of variation (CV): 10.8059031
Kurtosis: 393.0628325
Mean: 434.7189898
Median Absolute Deviation (MAD): 837.3298939
Skewness: 18.99082234
Sum: 86736437
Variance: 22066800.12

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.70000e+01 4.97500e+02 7.54000e+02 9.52500e+02 ... 2.35820e+04 3.09615e+04 3.77025e+04 7.06545e+04 9.99990e+04], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
0 | 192144 | 96.3%
15024 | 788 | 0.4%
7688 | 609 | 0.3%
7298 | 582 | 0.3%
99999 | 390 | 0.2%
3103 | 237 | 0.1%
5178 | 207 | 0.1%
5013 | 158 | 0.1%
4386 | 151 | 0.1%
3325 | 121 | 0.1%
Other values (122) | 4136 | 2.1%

Value | Count | Frequency (%)
0 | 192144 | 96.3%
114 | 11 | < 0.1%
401 | 33 | < 0.1%
594 | 88 | < 0.1%
914 | 17 | < 0.1%

Value | Count | Frequency (%)
99999 | 390 | 0.2%
41310 | 2 | < 0.1%
34095 | 11 | < 0.1%
27828 | 94 | < 0.1%
25236 | 23 | < 0.1%

capital_losses
Real number (ℝ≥0)

ZEROS
Distinct count: 113
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.313788385298935
Minimum: 0
Maximum: 4608
Zeros: 195617
Zeros (%): 98.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 4608
Range: 4608
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 271.8964284
Coefficient of variation (CV): 7.286754847
Kurtosis: 61.63293305
Mean: 37.31378839
Median Absolute Deviation (MAD): 73.1666158
Skewness: 7.6325647
Sum: 7444959
Variance: 73927.66776

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 77.5 184. 639. 1198. ... 2713. 2914. 3835. 4128. 4608. ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
0 | 195617 | 98.0%
1902 | 407 | 0.2%
1977 | 381 | 0.2%
1887 | 364 | 0.2%
1602 | 193 | 0.1%
2415 | 122 | 0.1%
1485 | 95 | < 0.1%
1848 | 88 | < 0.1%
1876 | 87 | < 0.1%
1672 | 85 | < 0.1%
Other values (103) | 2084 | 1.0%

Value | Count | Frequency (%)
0 | 195617 | 98.0%
155 | 1 | < 0.1%
213 | 10 | < 0.1%
323 | 10 | < 0.1%
419 | 29 | < 0.1%

Value | Count | Frequency (%)
4608 | 4 | < 0.1%
4356 | 30 | < 0.1%
3900 | 2 | < 0.1%
3770 | 5 | < 0.1%
3683 | 4 | < 0.1%

dividends_from_stocks
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 1478
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 197.52953293605248
Minimum: 0
Maximum: 99999
Zeros: 178382
Zeros (%): 89.4%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 400
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1984.163658
Coefficient of variation (CV): 10.04489622
Kurtosis: 1090.563754
Mean: 197.5295329
Median Absolute Deviation (MAD): 364.2557707
Skewness: 27.78650179
Sum: 39411685
Variance: 3936905.423

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 1.50000e+00 2.50000e+00 3.50000e+00 ... 4.91825e+04 4.99995e+04 5.00550e+04 9.75470e+04 9.99990e+04], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
0 | 178382 | 89.4%
100 | 1148 | 0.6%
500 | 1030 | 0.5%
1000 | 894 | 0.4%
200 | 866 | 0.4%
50 | 832 | 0.4%
2000 | 574 | 0.3%
250 | 555 | 0.3%
150 | 549 | 0.3%
300 | 523 | 0.3%
Other values (1468) | 14170 | 7.1%

Value | Count | Frequency (%)
0 | 178382 | 89.4%
1 | 472 | 0.2%
2 | 193 | 0.1%
3 | 129 | 0.1%
4 | 75 | < 0.1%

Value | Count | Frequency (%)
99999 | 25 | < 0.1%
95095 | 1 | < 0.1%
75000 | 5 | < 0.1%
70000 | 3 | < 0.1%
66621 | 2 | < 0.1%

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Nonfiler | 75094 | 37.6%
Joint both under 65 | 67383 | 33.8%
Single | 37421 | 18.8%
Joint both 65+ | 8332 | 4.2%
Head of household | 7426 | 3.7%
Joint one under 65 & one 65+ | 3867 | 1.9%

Length

Max length: 29
Mean length: 13.31297144
Min length: 7
ValueCountFrequency (%) 
Lowercase_Letter 15 62.5%
 
Uppercase_Letter 4 16.7%
 
Decimal_Number 2 8.3%
 
Other_Punctuation 1 4.2%
 
Space_Separator 1 4.2%
 
Math_Symbol 1 4.2%
 
ValueCountFrequency (%) 
Latin 19 79.2%
 
Common 5 20.8%
 
ValueCountFrequency (%) 
ASCII 24 100.0%
 

tax_filer_stat
Categorical

HIGH CORRELATION
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 183750 | 92.1%
South | 4889 | 2.5%
West | 4074 | 2.0%
Midwest | 3575 | 1.8%
Northeast | 2705 | 1.4%
Abroad | 530 | 0.3%

Length

Max length: 16
Mean length: 15.28176701
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 14 70.0%
 
Uppercase_Letter 5 25.0%
 
Space_Separator 1 5.0%
 
ValueCountFrequency (%) 
Latin 19 95.0%
 
Common 1 5.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

region_of_previous_residence
Categorical

HIGH CORRELATION
Distinct count: 50
Unique (%): < 0.1%
Missing: 708
Missing (%): 0.4%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 183750 | 92.1%
California | 1714 | 0.9%
Utah | 1063 | 0.5%
Florida | 849 | 0.4%
North Carolina | 812 | 0.4%
Abroad | 671 | 0.3%
Oklahoma | 626 | 0.3%
Minnesota | 576 | 0.3%
Indiana | 533 | 0.3%
North Dakota | 499 | 0.3%
Other values (40) | 7722 | 3.9%
(Missing) | 708 | 0.4%

Length

Max length: 21
Mean length: 15.46042311
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 24 53.3%
 
Uppercase_Letter 20 44.4%
 
Space_Separator 1 2.2%
 
ValueCountFrequency (%) 
Latin 44 97.8%
 
Common 1 2.2%
 
ValueCountFrequency (%) 
ASCII 45 100.0%
 

state_of_previous_residence
Categorical

HIGH CORRELATION
Distinct count: 38
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Householder | 53248 | 26.7%
Child <18 never marr not in subfamily | 50326 | 25.2%
Spouse of householder | 41695 | 20.9%
Nonfamily householder | 22213 | 11.1%
Child 18+ never marr Not in a subfamily | 12030 | 6.0%
Secondary individual | 6122 | 3.1%
Other Rel 18+ ever marr not in subfamily | 1956 | 1.0%
Grandchild <18 never marr child of subfamily RP | 1868 | 0.9%
Other Rel 18+ never marr not in subfamily | 1728 | 0.9%
Grandchild <18 never marr not in subfamily | 1066 | 0.5%
Other values (28) | 7271 | 3.6%

Length

Max length: 48
Mean length: 25.71388762
Min length: 12
ValueCountFrequency (%) 
Lowercase_Letter 21 60.0%
 
Uppercase_Letter 9 25.7%
 
Decimal_Number 2 5.7%
 
Math_Symbol 2 5.7%
 
Space_Separator 1 2.9%
 
ValueCountFrequency (%) 
Latin 30 85.7%
 
Common 5 14.3%
 
ValueCountFrequency (%) 
ASCII 35 100.0%
 

detailed_household_and_family_stat
Categorical

HIGH CORRELATION
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Householder | 75475 | 37.8%
Child under 18 never married | 50426 | 25.3%
Spouse of householder | 41709 | 20.9%
Child 18 or older | 14430 | 7.2%
Other relative of householder | 9703 | 4.9%
Nonrelative of householder | 7601 | 3.8%
Group Quarters- Secondary individual | 132 | 0.1%
Child under 18 ever married | 47 | < 0.1%

Length

Max length: 37
Mean length: 20.28793172
Min length: 12
ValueCountFrequency (%) 
Lowercase_Letter 18 62.1%
 
Uppercase_Letter 7 24.1%
 
Decimal_Number 2 6.9%
 
Space_Separator 1 3.4%
 
Dash_Punctuation 1 3.4%
 
ValueCountFrequency (%) 
Latin 25 86.2%
 
Common 4 13.8%
 
ValueCountFrequency (%) 
ASCII 29 100.0%
 

Distinct count: 99800
Unique (%): 50.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1740.3802692922618
Minimum: 37.87
Maximum: 18656.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 37.87
5-th percentile: 395.342
Q1: 1061.615
Median: 1618.31
Q3: 2188.61
95-th percentile: 3585.909
Maximum: 18656.3
Range: 18618.43
Interquartile range (IQR): 1126.995

Descriptive statistics

Standard deviation: 993.7681558
Coefficient of variation (CV): 0.5710063331
Kurtosis: 5.412514036
Mean: 1740.380269
Median Absolute Deviation (MAD): 741.3872327
Skewness: 1.432733152
Sum: 347245892.5
Variance: 987575.1475

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 37.87 92.5 116.535 148.965 182.24 ... 6424.78 7236.485 9270.445 12071.45 18656.3 ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
1601.4 | 32 | < 0.1%
753.23 | 32 | < 0.1%
1191.21 | 32 | < 0.1%
1787.34 | 32 | < 0.1%
707.9 | 31 | < 0.1%
1317.51 | 31 | < 0.1%
1070.15 | 30 | < 0.1%
1839.19 | 28 | < 0.1%
1002.02 | 28 | < 0.1%
1009.39 | 28 | < 0.1%
Other values (99790) | 199219 | 99.8%

Value | Count | Frequency (%)
37.87 | 1 | < 0.1%
39.11 | 1 | < 0.1%
40.67 | 2 | < 0.1%
42.82 | 2 | < 0.1%
43.26 | 3 | < 0.1%

Value | Count | Frequency (%)
18656.3 | 1 | < 0.1%
16349.2 | 1 | < 0.1%
13911.5 | 1 | < 0.1%
13145.1 | 1 | < 0.1%
13114.2 | 1 | < 0.1%

migration_code-change_in_msa
Categorical

HIGH CORRELATION
MISSING
Distinct count: 9
Unique (%): < 0.1%
Missing: 99696
Missing (%): 50.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Nonmover | 82538 | 41.4%
MSA to MSA | 10601 | 5.3%
NonMSA to nonMSA | 2811 | 1.4%
Not in universe | 1516 | 0.8%
MSA to nonMSA | 790 | 0.4%
NonMSA to MSA | 615 | 0.3%
Abroad to MSA | 453 | 0.2%
Not identifiable | 430 | 0.2%
Abroad to nonMSA | 73 | < 0.1%
(Missing) | 99696 | 50.0%

Length

Max length: 17
Mean length: 6.340857946
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 15 75.0%
 
Uppercase_Letter 4 20.0%
 
Space_Separator 1 5.0%
 
ValueCountFrequency (%) 
Latin 19 95.0%
 
Common 1 5.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

migration_code-change_in_reg
Categorical

HIGH CORRELATION
MISSING
Distinct count: 8
Unique (%): < 0.1%
Missing: 99696
Missing (%): 50.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Nonmover | 82538 | 41.4%
Same county | 9812 | 4.9%
Different county same state | 2797 | 1.4%
Not in universe | 1516 | 0.8%
Different region | 1178 | 0.6%
Different state same division | 991 | 0.5%
Abroad | 530 | 0.3%
Different division same region | 465 | 0.2%
(Missing) | 99696 | 50.0%

Length

Max length: 31
Mean length: 6.666534685
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 17 77.3%
 
Uppercase_Letter 4 18.2%
 
Space_Separator 1 4.5%
 
ValueCountFrequency (%) 
Latin 21 95.5%
 
Common 1 4.5%
 
ValueCountFrequency (%) 
ASCII 22 100.0%
 

migration_code-move_within_reg
Categorical

HIGH CORRELATION
MISSING
Distinct count: 9
Unique (%): < 0.1%
Missing: 99696
Missing (%): 50.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Nonmover | 82538 | 41.4%
Same county | 9812 | 4.9%
Different county same state | 2797 | 1.4%
Not in universe | 1516 | 0.8%
Different state in South | 973 | 0.5%
Different state in West | 679 | 0.3%
Different state in Midwest | 551 | 0.3%
Abroad | 530 | 0.3%
Different state in Northeast | 431 | 0.2%
(Missing) | 99696 | 50.0%

Length

Max length: 29
Mean length: 6.685710419
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 18 72.0%
 
Uppercase_Letter 6 24.0%
 
Space_Separator 1 4.0%
 
ValueCountFrequency (%) 
Latin 24 96.0%
 
Common 1 4.0%
 
ValueCountFrequency (%) 
ASCII 25 100.0%
 

live_in_this_house_1_year_ago
Categorical

HIGH CORRELATION
Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe under 1 year old | 101212 | 50.7%
Yes | 82538 | 41.4%
No | 15773 | 7.9%

Length

Max length: 33
Mean length: 18.63177178
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 13 76.5%
 
Uppercase_Letter 2 11.8%
 
Decimal_Number 1 5.9%
 
Space_Separator 1 5.9%
 
ValueCountFrequency (%) 
Latin 15 88.2%
 
Common 2 11.8%
 
ValueCountFrequency (%) 
ASCII 17 100.0%
 

migration_prev_res_in_sunbelt
Categorical

HIGH CORRELATION
MISSING
Distinct count: 3
Unique (%): < 0.1%
Missing: 99696
Missing (%): 50.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 84054 | 42.1%
No | 9987 | 5.0%
Yes | 5786 | 2.9%
(Missing) | 99696 | 50.0%

Length

Max length: 16
Mean length: 8.505570786
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 10 76.9%
 
Uppercase_Letter 2 15.4%
 
Space_Separator 1 7.7%
 
ValueCountFrequency (%) 
Latin 12 92.3%
 
Common 1 7.7%
 
ValueCountFrequency (%) 
ASCII 13 100.0%
 

num_persons_worked_for_employer
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9561804904697704
Minimum: 0
Maximum: 6
Zeros: 95983
Zeros (%): 48.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.365125505
Coefficient of variation (CV): 1.209052803
Kurtosis: -1.082246833
Mean: 1.95618049
Median Absolute Deviation (MAD): 2.103581512
Skewness: 0.7515606804
Sum: 390303
Variance: 5.593818657

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5.5 6. ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
0 | 95983 | 48.1%
6 | 36511 | 18.3%
1 | 23109 | 11.6%
4 | 14379 | 7.2%
3 | 13425 | 6.7%
2 | 10081 | 5.1%
5 | 6035 | 3.0%

Value | Count | Frequency (%)
0 | 95983 | 48.1%
1 | 23109 | 11.6%
2 | 10081 | 5.1%
3 | 13425 | 6.7%
4 | 14379 | 7.2%

Value | Count | Frequency (%)
6 | 36511 | 18.3%
5 | 6035 | 3.0%
4 | 14379 | 7.2%
3 | 13425 | 6.7%
2 | 10081 | 5.1%

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 144232 | 72.3%
Both parents present | 38983 | 19.5%
Mother only present | 12772 | 6.4%
Father only present | 1883 | 0.9%
Neither parent present | 1653 | 0.8%

Length

Max length: 23
Mean length: 17.32869895
Min length: 16
ValueCountFrequency (%) 
Lowercase_Letter 14 73.7%
 
Uppercase_Letter 4 21.1%
 
Space_Separator 1 5.3%
 
ValueCountFrequency (%) 
Latin 18 94.7%
 
Common 1 5.3%
 
ValueCountFrequency (%) 
ASCII 19 100.0%
 

country_of_birth_father
Categorical

MISSING
Distinct count: 42
Unique (%): < 0.1%
Missing: 6713
Missing (%): 3.4%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
United-States | 159163 | 79.8%
Mexico | 10008 | 5.0%
Puerto-Rico | 2680 | 1.3%
Italy | 2212 | 1.1%
Canada | 1380 | 0.7%
Germany | 1356 | 0.7%
Dominican-Republic | 1290 | 0.6%
Poland | 1212 | 0.6%
Philippines | 1154 | 0.6%
Cuba | 1125 | 0.6%
Other values (32) | 11230 | 5.6%
(Missing) | 6713 | 3.4%

Length

Max length: 29
Mean length: 12.70240524
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 21 45.7%
 
Uppercase_Letter 20 43.5%
 
Open_Punctuation 1 2.2%
 
Other_Punctuation 1 2.2%
 
Close_Punctuation 1 2.2%
 
Space_Separator 1 2.2%
 
Dash_Punctuation 1 2.2%
 
ValueCountFrequency (%) 
Latin 41 89.1%
 
Common 5 10.9%
 
ValueCountFrequency (%) 
ASCII 46 100.0%
 

country_of_birth_mother
Categorical

MISSING
Distinct count: 42
Unique (%): < 0.1%
Missing: 6119
Missing (%): 3.1%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
United-States | 160479 | 80.4%
Mexico | 9781 | 4.9%
Puerto-Rico | 2473 | 1.2%
Italy | 1844 | 0.9%
Canada | 1451 | 0.7%
Germany | 1382 | 0.7%
Philippines | 1231 | 0.6%
Poland | 1110 | 0.6%
El-Salvador | 1108 | 0.6%
Cuba | 1108 | 0.6%
Other values (32) | 11437 | 5.7%
(Missing) | 6119 | 3.1%

Length

Max length: 29
Mean length: 12.75193837
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 21 45.7%
 
Uppercase_Letter 20 43.5%
 
Open_Punctuation 1 2.2%
 
Close_Punctuation 1 2.2%
 
Other_Punctuation 1 2.2%
 
Space_Separator 1 2.2%
 
Dash_Punctuation 1 2.2%
 
ValueCountFrequency (%) 
Latin 41 89.1%
 
Common 5 10.9%
 
ValueCountFrequency (%) 
ASCII 46 100.0%
 

country_of_birth_self
Categorical

MISSING
Distinct count: 42
Unique (%): < 0.1%
Missing: 3393
Missing (%): 1.7%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
United-States | 176989 | 88.7%
Mexico | 5767 | 2.9%
Puerto-Rico | 1400 | 0.7%
Germany | 851 | 0.4%
Philippines | 845 | 0.4%
Cuba | 837 | 0.4%
Canada | 700 | 0.4%
Dominican-Republic | 690 | 0.3%
El-Salvador | 689 | 0.3%
China | 478 | 0.2%
Other values (32) | 6884 | 3.5%
(Missing) | 3393 | 1.7%

Length

Max length: 29
Mean length: 13.29676278
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 21 | 45.7%
Uppercase_Letter | 20 | 43.5%
Open_Punctuation | 1 | 2.2%
Other_Punctuation | 1 | 2.2%
Close_Punctuation | 1 | 2.2%
Space_Separator | 1 | 2.2%
Dash_Punctuation | 1 | 2.2%

Value | Count | Frequency (%)
Latin | 41 | 89.1%
Common | 5 | 10.9%

Value | Count | Frequency (%)
ASCII | 46 | 100.0%

citizenship
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Native- Born in the United States | 176992 | 88.7%
Foreign born- Not a citizen of U S | 13401 | 6.7%
Foreign born- U S citizen by naturalization | 5855 | 2.9%
Native- Born abroad of American Parent(s) | 1756 | 0.9%
Native- Born in Puerto Rico or U S Outlying | 1519 | 0.8%

Length

Max length: 44
Mean length: 34.57431975
Min length: 34

Value | Count | Frequency (%)
Lowercase_Letter | 20 | 60.6%
Uppercase_Letter | 9 | 27.3%
Open_Punctuation | 1 | 3.0%
Close_Punctuation | 1 | 3.0%
Space_Separator | 1 | 3.0%
Dash_Punctuation | 1 | 3.0%

Value | Count | Frequency (%)
Latin | 29 | 87.9%
Common | 4 | 12.1%

Value | Count | Frequency (%)
ASCII | 33 | 100.0%

own_business_or_self_employed
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 195.1 KiB

Value | Count | Frequency (%)
0 | 180672 | 90.6%
2 | 16153 | 8.1%
1 | 2698 | 1.4%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%

fill_inc_questionnaire_for_veteran's_admin
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 197539 | 99.0%
No | 1593 | 0.8%
Yes | 391 | 0.2%

Length

Max length: 16
Mean length: 15.87269137
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 9 | 75.0%
Uppercase_Letter | 2 | 16.7%
Space_Separator | 1 | 8.3%

Value | Count | Frequency (%)
Latin | 11 | 91.7%
Common | 1 | 8.3%

Value | Count | Frequency (%)
ASCII | 12 | 100.0%

veterans_benefits
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 195.1 KiB

Value | Count | Frequency (%)
2 | 150130 | 75.2%
0 | 47409 | 23.8%
1 | 1984 | 1.0%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%

weeks_worked_in_year
Real number (ℝ≥0)

ZEROS
Distinct count: 53
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.174897129654227
Minimum: 0
Maximum: 52
Zeros: 95983
Zeros (%): 48.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 8
Q3: 52
95-th percentile: 52
Maximum: 52
Range: 52
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 24.41148817
Coefficient of variation (CV): 1.053359073
Kurtosis: -1.863805826
Mean: 23.17489713
Mean absolute deviation (MAD): 23.68273979
Skewness: 0.2101693419
Sum: 4623925
Variance: 595.9207546
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 3.5 4.5 5.5 ... 48.5 49.5 50.5 51.5 52. ], "bayesian blocks" binning strategy used)
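
Several of the descriptive statistics reported above for weeks_worked_in_year are tied together by simple identities (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations, IQR = Q3 − Q1, range = maximum − minimum). The following is a minimal sketch that re-derives them from the figures printed in this report; the variable names are illustrative and are not part of pandas-profiling.

```python
# Figures copied from this report (weeks_worked_in_year section and dataset overview).
n_observations = 199523
mean = 23.174897129654227
std = 24.41148817
q1, q3 = 0, 52
minimum, maximum = 0, 52

# Derived quantities, using the standard definitions.
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
total = mean * n_observations  # sum of all values
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4f}")
print(f"Sum      = {total:.0f}")
print(f"IQR      = {iqr}, Range = {value_range}")
```

Small differences in the last digits are expected because the report prints rounded inputs.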
Value | Count | Frequency (%)
0 | 95983 | 48.1%
52 | 70314 | 35.2%
40 | 2790 | 1.4%
50 | 2304 | 1.2%
26 | 2268 | 1.1%
48 | 1806 | 0.9%
12 | 1780 | 0.9%
30 | 1378 | 0.7%
20 | 1330 | 0.7%
8 | 1126 | 0.6%
Other values (43) | 18444 | 9.2%

Value | Count | Frequency (%)
0 | 95983 | 48.1%
1 | 464 | 0.2%
2 | 458 | 0.2%
3 | 417 | 0.2%
4 | 757 | 0.4%

Value | Count | Frequency (%)
52 | 70314 | 35.2%
51 | 819 | 0.4%
50 | 2304 | 1.2%
49 | 509 | 0.3%
48 | 1806 | 0.9%

year
Categorical

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
94 | 99827 | 50.0%
95 | 99696 | 50.0%

Length

Max length: 2
Mean length: 2
Min length: 2

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%

income
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 195.1 KiB

Value | Count | Frequency (%)
0 | 187141 | 93.8%
1 | 12382 | 6.2%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 2 | 100.0%

Value | Count | Frequency (%)
Common | 2 | 100.0%

Value | Count | Frequency (%)
ASCII | 2 | 100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
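
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative example on made-up data, not the code pandas-profiling runs; the function name `pearson_r` and the sample values are assumptions introduced here.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)

# Shifting and (positively) rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])

print(r, r_rescaled)
```

The same 1/n factor appears in the covariance and in both standard deviations, so it cancels; using n − 1 throughout would give the same r.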

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
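
As a rough illustration of the definition above, the sketch below ranks each variable (ties receive the average of their positions, a common convention assumed here) and then applies Pearson's formula to the ranks. The helper names and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this data ρ reaches 1 because the ordering is perfectly preserved, while r stays below 1 because the relationship is not linear.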

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
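
The pair-counting description above corresponds to the simplest variant, often called tau-a. The sketch below implements exactly that on hypothetical data; note that statistical libraries commonly report a tie-adjusted variant (tau-b), so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical sample data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```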

Missing values

Sample

First rows

(index) | age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in__edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | federal_income_tax_liability | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | migration_code-change_in_msa | migration_code-change_in_reg | migration_code-move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | income
0 | 73 | Not in universe | 0 | 0 | High school graduate | 0 | Not in universe | Widowed | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not in labor force | 0 | 0 | 0 | Nonfiler | Not in universe | Not in universe | Other Rel 18+ ever marr not in subfamily | Other relative of householder | 1700.09 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 0 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 0 | 95 | 0
1 | 58 | Self-employed-not incorporated | 4 | 34 | Some college but no degree | 0 | Not in universe | Divorced | Construction | Precision production craft & repair | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Head of household | South | Arkansas | Householder | Householder | 1053.55 | MSA to MSA | Same county | Same county | No | Yes | 1 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 94 | 0
2 | 18 | Not in universe | 0 | 0 | 10th grade | 0 | High school | Never married | Not in universe or children | Not in universe | Asian or Pacific Islander | All other | Female | Not in universe | Not in universe | Not in labor force | 0 | 0 | 0 | Nonfiler | Not in universe | Not in universe | Child 18+ never marr Not in a subfamily | Child 18 or older | 991.95 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 0 | Not in universe | Vietnam | Vietnam | Vietnam | Foreign born- Not a citizen of U S | 0 | Not in universe | 2 | 0 | 95 | 0
3 | 9 | Not in universe | 0 | 0 | Children | 0 | Not in universe | Never married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Nonfiler | Not in universe | Not in universe | Child <18 never marr not in subfamily | Child under 18 never married | 1758.14 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 0 | 0 | 94 | 0
4 | 10 | Not in universe | 0 | 0 | Children | 0 | Not in universe | Never married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Nonfiler | Not in universe | Not in universe | Child <18 never marr not in subfamily | Child under 18 never married | 1069.16 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 0 | 0 | 94 | 0
5 | 48 | Private | 40 | 10 | Some college but no degree | 1200 | Not in universe | Married-civilian spouse present | Entertainment | Professional specialty | Amer Indian Aleut or Eskimo | All other | Female | No | Not in universe | Full-time schedules | 0 | 0 | 0 | Joint both under 65 | Not in universe | Not in universe | Spouse of householder | Spouse of householder | 162.61 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 1 | Not in universe | Philippines | United-States | United-States | Native- Born in the United States | 2 | Not in universe | 2 | 52 | 95 | 0
6 | 42 | Private | 34 | 3 | Bachelors degree(BA AB BS) | 0 | Not in universe | Married-civilian spouse present | Finance insurance and real estate | Executive admin and managerial | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 5178 | 0 | 0 | Joint both under 65 | Not in universe | Not in universe | Householder | Householder | 1535.86 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 6 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 94 | 0
7 | 28 | Private | 4 | 40 | High school graduate | 0 | Not in universe | Never married | Construction | Handlers equip cleaners etc | White | All other | Female | Not in universe | Job loser - on layoff | Unemployed full-time | 0 | 0 | 0 | Single | Not in universe | Not in universe | Secondary individual | Nonrelative of householder | 898.83 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 4 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 30 | 95 | 0
8 | 47 | Local government | 43 | 26 | Some college but no degree | 876 | Not in universe | Married-civilian spouse present | Education | Adm support including clerical | White | All other | Female | No | Not in universe | Full-time schedules | 0 | 0 | 0 | Joint both under 65 | Not in universe | Not in universe | Spouse of householder | Spouse of householder | 1661.53 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 5 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 95 | 0
9 | 34 | Private | 4 | 37 | Some college but no degree | 0 | Not in universe | Married-civilian spouse present | Construction | Machine operators assmblrs & inspctrs | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Joint both under 65 | Not in universe | Not in universe | Householder | Householder | 1146.79 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 6 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 94 | 0

Last rows

(index) | age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in__edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | federal_income_tax_liability | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | migration_code-change_in_msa | migration_code-change_in_reg | migration_code-move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | income
199513 | 57 | Private | 9 | 37 | 9th grade | 0 | Not in universe | Divorced | Manufacturing-durable goods | Machine operators assmblrs & inspctrs | White | Central or South American | Female | Not in universe | Not in universe | Full-time schedules | 0 | 0 | 0 | Single | Not in universe | Not in universe | Householder | Householder | 743.66 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 4 | Not in universe | Dominican-Republic | Dominican-Republic | Dominican-Republic | Foreign born- Not a citizen of U S | 0 | Not in universe | 2 | 52 | 95 | 0
199514 | 51 | Private | 33 | 19 | 10th grade | 0 | Not in universe | Widowed | Retail trade | Sales | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Single | South | North Dakota | Householder | Householder | 1302.34 | NonMSA to nonMSA | Same county | Same county | No | Yes | 6 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 94 | 0
199515 | 87 | Not in universe | 0 | 0 | High school graduate | 0 | Not in universe | Widowed | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not in labor force | 0 | 0 | 0 | Single | Not in universe | Not in universe | Nonfamily householder | Householder | 3255.80 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 0 | Not in universe | NaN | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 0 | 95 | 0
199516 | 3 | Not in universe | 0 | 0 | Children | 0 | Not in universe | Never married | Not in universe or children | Not in universe | Black | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Nonfiler | South | Utah | Child under 18 of RP of unrel subfamily | Nonrelative of householder | 2733.75 | MSA to MSA | Same county | Same county | No | Yes | 0 | Mother only present | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 0 | 0 | 94 | 0
199517 | 39 | Private | 43 | 26 | Bachelors degree(BA AB BS) | 0 | Not in universe | Never married | Education | Adm support including clerical | Other | Mexican-American | Male | No | Not in universe | Full-time schedules | 6849 | 0 | 0 | Single | Not in universe | Not in universe | Nonfamily householder | Householder | 908.14 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 6 | Not in universe | Mexico | Mexico | Mexico | Foreign born- Not a citizen of U S | 2 | Not in universe | 2 | 52 | 95 | 0
199518 | 87 | Not in universe | 0 | 0 | 7th and 8th grade | 0 | Not in universe | Married-civilian spouse present | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Not in labor force | 0 | 0 | 0 | Joint both 65+ | Not in universe | Not in universe | Householder | Householder | 955.27 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 0 | Not in universe | Canada | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 0 | 95 | 0
199519 | 65 | Self-employed-incorporated | 37 | 2 | 11th grade | 0 | Not in universe | Married-civilian spouse present | Business and repair services | Executive admin and managerial | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 6418 | 0 | 9 | Joint one under 65 & one 65+ | Not in universe | Not in universe | Householder | Householder | 687.19 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 1 | Not in universe | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 52 | 94 | 0
199520 | 47 | Not in universe | 0 | 0 | Some college but no degree | 0 | Not in universe | Married-civilian spouse present | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 157 | Joint both under 65 | Not in universe | Not in universe | Householder | Householder | 1923.03 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 6 | Not in universe | Poland | Poland | Germany | Foreign born- U S citizen by naturalization | 0 | Not in universe | 2 | 52 | 95 | 0
199521 | 16 | Not in universe | 0 | 0 | 10th grade | 0 | High school | Never married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not in labor force | 0 | 0 | 0 | Nonfiler | Not in universe | Not in universe | Child <18 never marr not in subfamily | Child under 18 never married | 4664.87 | NaN | NaN | NaN | Not in universe under 1 year old | NaN | 0 | Both parents present | United-States | United-States | United-States | Native- Born in the United States | 0 | Not in universe | 2 | 0 | 95 | 0
199522 | 32 | Private | 42 | 30 | High school graduate | 0 | Not in universe | Never married | Medical except hospital | Other service | Black | All other | Female | No | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Single | Not in universe | Not in universe | Nonfamily householder | Householder | 1830.11 | Nonmover | Nonmover | Nonmover | Yes | Not in universe | 6 | Not in universe | NaN | NaN | NaN | Foreign born- Not a citizen of U S | 0 | Not in universe | 2 | 52 | 94 | 0